Variational Bayes is known to be less reliable than MCMC, and my previous analysis (compare_models.Rmd) confirmed that its reliability falls short of what we need.
So the next step is to try the model again with MCMC. I’ll repeat the previous analysis, in which I repeatedly tested models with a couple of different parameter settings, but this time run it with both MCMC and variational Bayes.
We want to answer the questions:
One point that arose during this work was how far I had tried to optimize variational Bayes: the number of iterations I used may simply have been too small. The default for variational Bayes is 10000 iterations, but we had been using only 1000.
For an initial pilot analysis using the “fastDebug” option I created, which runs just 100 iterations, the run took about 1000 seconds. For the 4000 iterations we plan to use for MCMC, we can expect roughly 40 times that duration, or about 11 hours.
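That extrapolation can be checked with a quick back-of-the-envelope calculation, assuming run time scales roughly linearly with iteration count:

```r
# Pilot: 100 iterations ("fastDebug") took about 1000 seconds.
pilot_iters <- 100
pilot_seconds <- 1000
seconds_per_iter <- pilot_seconds / pilot_iters  # 10 s per iteration

# Projected duration for a 4000-iteration MCMC run, assuming linear scaling.
mcmc_iters <- 4000
projected_seconds <- seconds_per_iter * mcmc_iters
projected_seconds / 3600  # roughly 11 hours
```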
In the last comparison I tested four different models. For this one, I’ll keep it simple and compare only double_update.stan, which processes a single run of reward or punishment data. The reason is that preliminary testing showed the full model was not feasible to run on our system using MCMC (see later for details).
We’ll run each model three times, as in the previous comparison, but this time using MCMC as well as variational Bayes for the estimation. As last time, we will run this for both Group 2 and Group 3. All up, with two groups, two models (with and without trial posteriors), three repetitions, and two estimation techniques, this will require 24 runs.
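As a sanity check on that count, the full design can be enumerated (the labels below are illustrative, not the actual values used by the wrapper):

```r
# Enumerate the planned runs: 2 groups x 2 models x 3 repetitions x 2 methods.
runs <- expand.grid(
  group      = c("Group 2", "Group 3"),
  model      = c("double_update", "double_update_notrialpost"),
  repetition = 1:3,
  method     = c("vb", "MCMC")
)
nrow(runs)  # 24 runs in total
```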
In the backend, this has required some changes to my Stan wrapper function:
# source('../compare_vb_and_MCMC_doubleUpdateOnly.R'): we never properly
# completed those results, and the notrialposteriors version ended up being
# simpler and probably demonstrates the same principle. We can't (and don't
# want to) run the full model with posteriors, so we have to choose. We're
# running the notrialposteriors version now, and this will give us an
# opportunity to compare with vb.
source("compare_notrialposteriors_full.R")
## [1] "running on MSMServer"
## [1] "Loading cached version of this."
It is important to optimize model estimation in RStan because it is very computationally intensive. Initially, we were calculating trial posteriors for every subject and trial in order to plot a predicted time course for each subject. I tried our Stan model script without that step and observed a marked improvement in processing time.
speeds <- as.data.frame(t(sapply(model.summaries, function(x) {
return(cbind(x$m, x$g, length(x$summaryObj$iter), x$t, x$elapsedTime))
})), stringsAsFactors = FALSE)
colnames(speeds) <- c("model", "subjectgroup", "estimation method", "testiteration",
"speed")
speeds$model <- sapply(speeds$model, function(x) {
return(sub("double_update$", "Basic double update with trial posteriors",
x))
})
speeds$model <- gsub("^double_update_notrialpost$", "Basic double update without trial posteriors",
speeds$model)
# The estimation method wasn't recorded explicitly, but it can be inferred
# from the iteration count: 3992 draws means vb, 24000 means MCMC.
speeds$`estimation method` <- gsub("24000", "MCMC", gsub("3992", "vb", as.character(speeds$`estimation method`)))
speeds <- data.table(speeds)
speeds.agg <- speeds[, .(mean = round(mean(as.numeric(speed)), 0), range = paste0(round(min(as.numeric(speed)),
0), " to ", round(max(as.numeric(speed)), 0), " s")), by = .(model, `estimation method`)]
knitr::kable(speeds.agg)
| model | estimation method | mean (s) | range |
|---|---|---|---|
| Basic double update with trial posteriors | vb | 116 | 98 to 131 s |
| Basic double update with trial posteriors | MCMC | 13092 | 1613 to 20055 s |
| Basic double update without trial posteriors | vb | 13 | 8 to 19 s |
| Basic double update without trial posteriors | MCMC | 2517 | 451 to 10095 s |
Across three iterations and two subject groups, calculated independently, the basic double update model was consistently calculated much faster using Variational Bayes, and generally faster when not calculating trial posteriors.
If Variational Bayes is just as reliable as MCMC for this problem, there would seem to be no reason to use MCMC, given Variational Bayes’ speed: whether or not trial posteriors are calculated, Variational Bayes is much faster.
As can be seen across estimation methods and subject groups, calculation was also much faster when trial posteriors were not calculated, so they should be skipped wherever they are not needed.
# arrange all the data into a single data table.
model.summary.all <- NULL
for (ms.i in 1:length(model.summaries)) {
ms <- model.summaries[[ms.i]]
ms.summaryObj <- ms$summaryObj
ms.summaryObj$TestId <- ms.i
ms.summaryObj$Group <- ms$g
ms.summaryObj$ModelName <- ms$m
ms.summaryObj$AnalysisRepetition <- ms$t
# because when we ran this, we hadn't explicitly recorded estimation
# methods; but these are distinguishable by the number of iterations.
if (length(ms$summaryObj$iter) == 3992) {
ms.summaryObj$EstimationMethod = as.character(ESTIMATION_METHOD.VariationalBayes)
} else if (length(ms$summaryObj$iter) == 24000) {
ms.summaryObj$EstimationMethod = as.character(ESTIMATION_METHOD.MCMC)
}
if (is.null(model.summary.all)) {
model.summary.all <- ms.summaryObj
} else {
model.summary.all <- rbind(model.summary.all, ms.summaryObj)
}
}
model.summary.all$EstimationMethod <- factor(model.summary.all$EstimationMethod)
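As a design note, the loop above grows model.summary.all one rbind at a time, which gets slow as the table grows; the same result can be obtained by annotating each summary and binding once. A toy base-R sketch with stand-in data frames (the real inputs are the model.summaries elements):

```r
# Stand-in for model.summaries: each element yields one summary data frame.
summaries <- list(data.frame(Value = 1:2), data.frame(Value = 3:4))

# Annotate each summary with its index, then bind all rows in one call.
annotated <- lapply(seq_along(summaries), function(i) {
  s <- summaries[[i]]
  s$TestId <- i
  s
})
bound <- do.call(rbind, annotated)
bound$TestId  # 1 1 2 2
```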
source("visualization/geom_hdi.R")
m.mu.run1 <- model.summary.all[Motivation == "Punishment" & Statistic == "mu" &
Run == 1]
# table(m.reward.mu.run1$ModelName) for clarity's sake...
m.mu.run1$ModelName <- sub("^double_update$", "with trial posteriors", m.mu.run1$ModelName)
m.mu.run1$ModelName <- sub("^double_update_notrialpost$", " without trial posteriors",
m.mu.run1$ModelName)
# plotly::ggplotly(p)
ggplot(m.mu.run1[Parameter == "alpha"], aes(x = Value, fill = factor(ModelName),
color = factor(ModelName))) + geom_freqpoly(alpha = 0.5, binwidth = 0.001) +
geom_hdi(size = 2, lineend = "round", alpha = 0.5, credible_mass = 0.95) +
coord_cartesian(xlim = c(0, 0.5), ylim = c(0, 80)) + facet_grid(AnalysisRepetition ~
EstimationMethod + Group, scales = "fixed") + labs(title = paste0("mu statistic in punishment rounds, alpha")) +
scale_color_discrete(guide = guide_legend(title = "Double Update Model..."))
(Figure: “mu statistic in punishment rounds, alpha”; frequency polygons with 95% HDI bars for each model variant, faceted by analysis repetition, estimation method, and group.)
We set consistent seeds for each model configuration; randomization seeds differed only between iterations. Figure @ref(fig:VisualizeIterationConsistency) shows, as we would expect for a correctly specified model with consistently set seeds, that including trial posteriors generally made little difference to the model estimates. Each model does show some difference, likely random variation, and in one case (the second iteration using MCMC to calculate a model for Group 2), the model without trial posteriors produced a long tail not observed in the model with trial posteriors.
Subsequent graphs in this section will present only results when not calculating posteriors, unless otherwise specified.
source("visualization/geom_hdi.R")
m.mu.run1 <- model.summary.all[Motivation == "Punishment" & Statistic == "mu" &
Run == 1]
# table(m.reward.mu.run1$ModelName) for clarity's sake...
# m.mu.run1$ModelName<-sub('^double_update$','DU with trial
# posteriors',m.mu.run1$ModelName)
# m.mu.run1$ModelName<-sub('^double_update_notrialpost$',' DU without trial
# posteriors',m.mu.run1$ModelName) plotly::ggplotly(p)
ggplot(m.mu.run1[Parameter == "alpha" & ModelName == "double_update_notrialpost"],
aes(x = Value, fill = factor(EstimationMethod), color = factor(EstimationMethod))) +
geom_freqpoly(alpha = 0.5, binwidth = 0.001) + geom_hdi(size = 2, lineend = "round",
alpha = 0.5, credible_mass = 0.95) + coord_cartesian(xlim = c(0, 0.5), ylim = c(0,
80)) + facet_grid(AnalysisRepetition ~ Group, scales = "free") + labs(title = expression(paste(alpha[mu],
" punishment rounds for DU model"))) + scale_color_discrete(guide = guide_legend(title = "Double Update Model..."))
(Figure: alpha mu statistic in punishment rounds for the DU model without trial posteriors; 95% HDIs by estimation method, faceted by analysis repetition and group.)
ggplot(m.mu.run1[Parameter == "beta" & ModelName == "double_update_notrialpost"],
aes(x = Value, fill = factor(EstimationMethod), color = factor(EstimationMethod))) +
geom_freqpoly(alpha = 0.5, binwidth = 0.001) + geom_hdi(size = 2, lineend = "round",
alpha = 0.5, credible_mass = 0.95) + # coord_cartesian(xlim=c(0,0.5),ylim=c(0,80))+
facet_grid(AnalysisRepetition ~ Group, scales = "free") + labs(title = expression(paste(beta[mu],
" punishment rounds for DU model"))) + scale_color_discrete(guide = guide_legend(title = "Double Update Model..."))
(Figure: beta mu statistic in punishment rounds for the DU model without trial posteriors; 95% HDIs by estimation method, faceted by analysis repetition and group.)
From Figure @ref(fig:CompareVBAndMCMCReliability) it is immediately apparent that MCMC, as we’ve applied it, estimates much wider HDIs than variational Bayes. We ran 2000 iterations per chain: 1000 warmup and 1000 estimation. It may be that estimation hadn’t stabilized after 1000 warmup steps, in which case we need more warmup iterations, and perhaps more estimation iterations too. However, in most cases, the MCMC estimates seem relatively stable across iterations. That may be an indication that the wide MCMC estimates correctly represent the data, and that the true posterior distributions for these data are relatively wide.
Examining the Variational Bayes results, there are real concerns about whether it has converged on appropriate results; it may be settling into local optima, as can be seen below:
source("visualization/geom_hdi.R")
m.mu.run1 <- model.summary.all[Motivation == "Punishment" & Statistic == "mu" &
Run == 1]
# table(m.reward.mu.run1$ModelName) for clarity's sake...
# m.mu.run1$ModelName<-sub('^double_update$','DU with trial
# posteriors',m.mu.run1$ModelName)
# m.mu.run1$ModelName<-sub('^double_update_notrialpost$',' DU without trial
# posteriors',m.mu.run1$ModelName) plotly::ggplotly(p)
ggplot(m.mu.run1[ModelName == "double_update_notrialpost"], aes(x = Value, fill = factor(AnalysisRepetition),
color = factor(AnalysisRepetition))) + geom_freqpoly(alpha = 0.5, binwidth = 0.001) +
geom_hdi(size = 2, lineend = "round", alpha = 0.5, credible_mass = 0.95) +
coord_cartesian(ylim = c(0, 80)) + facet_grid(EstimationMethod ~ Group +
Parameter, scales = "free") + labs(title = expression(paste(beta[mu], " punishment rounds for DU model"))) +
scale_color_discrete(guide = guide_legend(title = "Analysis Repetition:"))
(Figure: punishment-round mu parameters for the DU model without trial posteriors; 95% HDIs colored by analysis repetition, faceted by estimation method, group, and parameter.)
The pattern here is that the MCMC HDIs are much wider but overlap from one iteration to the next. In contrast, our Variational Bayes HDIs often fail to overlap at all across repeated iterations. Thus, it cannot be assumed that our Variational Bayes calculations are reliable, whereas our MCMC results, although insufficiently precise, seem to capture the estimates accurately, with one or two exceptions.
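The 95% HDIs drawn by geom_hdi can in principle be reproduced from raw draws: the HDI is the shortest interval containing 95% of the sample. A minimal base-R sketch (the helper name is hypothetical; geom_hdi’s own implementation may differ):

```r
# Shortest interval containing `credible_mass` of the sample: slide a window
# of the required size along the sorted draws and keep the narrowest one.
hdi_of_sample <- function(draws, credible_mass = 0.95) {
  sorted <- sort(draws)
  n <- length(sorted)
  n_in <- ceiling(credible_mass * n)  # draws inside the interval
  widths <- sorted[n_in:n] - sorted[1:(n - n_in + 1)]
  lo <- which.min(widths)
  c(lower = sorted[lo], upper = sorted[lo + n_in - 1])
}

set.seed(1)
hdi_of_sample(rnorm(10000))  # close to (-1.96, 1.96) for a standard normal
```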
Considering the wide HDIs produced by our MCMC estimates, I want to take a closer look to see if the MCMC estimates are representative, accurate, and efficient.
To assess representativeness for an MCMC algorithm, we need to check whether the chains have converged, such that the randomly chosen initial values are unrelated to the final estimates. We can visually examine the trace plots below, examine the density plots, and examine the Gelman-Rubin statistic (Gelman & Rubin, 1992), also known as the “potential scale reduction factor” or “shrink factor”. In rstan, this is available as the statistic \(\widehat{R}\).
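As context for the Gelman-Rubin statistic: it compares between-chain and within-chain variance. A simplified base-R sketch of the classic form (rstan’s Rhat additionally splits each chain in half; the function name here is hypothetical):

```r
# Simplified potential scale reduction factor (Gelman & Rubin, 1992).
# `chains` is an iterations x chains matrix of post-warmup draws for
# one parameter.
rhat <- function(chains) {
  n <- nrow(chains)                      # iterations per chain
  chain_means <- colMeans(chains)
  B <- n * var(chain_means)              # between-chain variance
  W <- mean(apply(chains, 2, var))       # within-chain variance
  var_plus <- (n - 1) / n * W + B / n    # pooled variance estimate
  sqrt(var_plus / W)
}

set.seed(2)
mixed <- matrix(rnorm(4000), ncol = 4)           # four well-mixed chains
stuck <- cbind(matrix(rnorm(3000), ncol = 3),    # one chain stuck elsewhere
               rnorm(1000, mean = 3))
rhat(mixed)  # close to 1
rhat(stuck)  # well above 1.1
```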
# class(model.stanfits[[5]])
for (i in 1:length(model.summaries)) {
if (model.summaries[[i]]$m == "double_update_notrialpost") {
# only look at these because looking at models with trial posteriors is
# unnecessary and timeconsuming.
traceplot.title <- paste("group=", model.summaries[[i]]$g, model.summaries[[i]]$m,
model.summaries[[i]]$t, model.summary.all[, first(EstimationMethod),
by = TestId][i, V1], "vars=", length(names(model.stanfits[[i]])))
# cols.to.process<-[names(model.stanfits[[i]]) %in%
cols.to.process <- names(model.stanfits[[i]])[!sapply(names(model.stanfits[[i]]),
function(x) {
return(grepl("alpha\\[", x) || grepl("beta\\[", x) || grepl("alpha_pr",
x) || grepl("beta_pr", x) || grepl("log_lik\\[", x))
})]
# print(cols.to.process)
trace <- traceplot(model.stanfits[[i]], cols.to.process, alpha = 0.5,
inc_warmup = TRUE)
print(trace + labs(title = traceplot.title) + scale_color_discrete(guide = guide_legend(nrows = 1)) +
theme(legend.position = "bottom")) #+title(traceplot.title))
}
}
They look pretty stable, so we can’t complain about instability, though it is strange that there is such quick initial convergence followed by stable but widely differing chain patterns.
For iteration 2 of the double update (no trial posteriors) version, we do see two parameters for which one chain failed to converge, although all the other chains converged correctly.
# class(model.stanfits[[5]])
for (i in 1:length(model.summaries)) {
if (model.summaries[[i]]$m == "double_update_notrialpost") {
# only look at these because looking at models with trial posteriors is
# unnecessary and timeconsuming.
traceplot.title <- paste("group=", model.summaries[[i]]$g, model.summaries[[i]]$m,
model.summaries[[i]]$t, model.summary.all[, first(EstimationMethod),
by = TestId][i, V1], "vars=", length(names(model.stanfits[[i]])))
# cols.to.process<-[names(model.stanfits[[i]]) %in%
cols.to.process <- names(model.stanfits[[i]])[!sapply(names(model.stanfits[[i]]),
function(x) {
return(grepl("alpha\\[", x) || grepl("beta\\[", x) || grepl("alpha_pr",
x) || grepl("beta_pr", x) || grepl("log_lik\\[", x))
})]
# print(cols.to.process)
stanplot <- plot(model.stanfits[[i]])
print(stanplot + labs(title = traceplot.title, x = "log value", y = "statistic")) #+scale_color_discrete(guide=guide_legend(nrows=1))+theme(legend.position = 'bottom'))#+title(traceplot.title))
}
}
Do the chains overlap, or do they mostly not? Given the trace plots, I’d expect a high degree of overlap.
# class(model.stanfits[[5]])
for (i in 1:length(model.summaries)) {
if (model.summaries[[i]]$m == "double_update_notrialpost") {
# only look at these because looking at models with trial posteriors is
# unnecessary and timeconsuming.
traceplot.title <- paste("group=", model.summaries[[i]]$g, model.summaries[[i]]$m,
model.summaries[[i]]$t, model.summary.all[, first(EstimationMethod),
by = TestId][i, V1], "vars=", length(names(model.stanfits[[i]])))
# cols.to.process<-[names(model.stanfits[[i]]) %in%
cols.to.process <- names(model.stanfits[[i]])[!sapply(names(model.stanfits[[i]]),
function(x) {
return(grepl("alpha\\[", x) || grepl("beta\\[", x) || grepl("alpha_pr",
x) || grepl("beta_pr", x) || grepl("log_lik\\[", x))
})]
# print(cols.to.process)
stanplot <- stan_dens(model.stanfits[[i]], separate_chains = TRUE, show_density = TRUE,
show_outer_line = TRUE)
print(stanplot + labs(title = traceplot.title, x = "log value", y = "statistic")) #+scale_color_discrete(guide=guide_legend(nrows=1))+theme(legend.position = 'bottom'))#+title(traceplot.title))
}
}
Visually, these seem reasonably good, with the exception of Group 3, No Trial Posteriors. How do we calculate MCSE and ESS?
# class(model.stanfits[[5]])
for (i in 1:length(model.summaries)) {
if (model.summaries[[i]]$m == "double_update_notrialpost") {
# only look at these because looking at models with trial posteriors is
# unnecessary and timeconsuming.
traceplot.title <- paste("group=", model.summaries[[i]]$g, model.summaries[[i]]$m,
model.summaries[[i]]$t, model.summary.all[, first(EstimationMethod),
by = TestId][i, V1], "vars=", length(names(model.stanfits[[i]])))
# cols.to.process<-[names(model.stanfits[[i]]) %in%
cols.to.process <- names(model.stanfits[[i]])[!sapply(names(model.stanfits[[i]]),
function(x) {
return(grepl("alpha\\[", x) || grepl("beta\\[", x) || grepl("alpha_pr",
x) || grepl("beta_pr", x) || grepl("log_lik\\[", x))
})]
# print(cols.to.process)
print(model.stanfits[[i]], pars = c("mu_p", "sigma"))
}
}
## Inference for Stan model: double_update_notrialpost.
## 1 chains, each with iter=998; warmup=0; thin=1;
## post-warmup draws per chain=998, total post-warmup draws=998.
##
## mean sd 2.5% 25% 50% 75% 97.5%
## mu_p[1] -0.80 0.04 -0.88 -0.83 -0.80 -0.77 -0.71
## mu_p[2] -1.15 0.02 -1.18 -1.16 -1.15 -1.13 -1.11
## sigma[1] 1.00 0.06 0.89 0.96 1.00 1.04 1.12
## sigma[2] 0.13 0.02 0.10 0.12 0.13 0.14 0.16
##
## Approximate samples were drawn using VB(meanfield) at Wed Oct 4 19:06:09 2017.
## Inference for Stan model: double_update_notrialpost.
## 1 chains, each with iter=998; warmup=0; thin=1;
## post-warmup draws per chain=998, total post-warmup draws=998.
##
## mean sd 2.5% 25% 50% 75% 97.5%
## mu_p[1] -0.85 0.05 -0.94 -0.88 -0.85 -0.81 -0.76
## mu_p[2] -1.05 0.02 -1.09 -1.06 -1.05 -1.03 -1.01
## sigma[1] 0.76 0.06 0.64 0.71 0.76 0.80 0.89
## sigma[2] 0.08 0.02 0.05 0.07 0.08 0.10 0.12
##
## Approximate samples were drawn using VB(meanfield) at Wed Oct 4 19:06:19 2017.
## Inference for Stan model: double_update_notrialpost.
## 6 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=6000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## mu_p[1] -0.57 0 0.17 -0.90 -0.68 -0.57 -0.46 -0.24 1601 1
## mu_p[2] -1.10 0 0.05 -1.20 -1.13 -1.09 -1.06 -1.00 2452 1
## sigma[1] 1.17 0 0.16 0.90 1.05 1.15 1.27 1.54 2677 1
## sigma[2] 0.25 0 0.05 0.16 0.21 0.24 0.28 0.35 1898 1
##
## Samples were drawn using NUTS(diag_e) at Wed Oct 4 19:21:59 2017.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
## Inference for Stan model: double_update_notrialpost.
## 6 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=6000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## mu_p[1] -0.90 0 0.17 -1.25 -1.01 -0.89 -0.78 -0.56 1857 1
## mu_p[2] -1.11 0 0.07 -1.25 -1.15 -1.11 -1.07 -0.99 2739 1
## sigma[1] 0.86 0 0.15 0.60 0.75 0.84 0.95 1.20 2637 1
## sigma[2] 0.18 0 0.07 0.04 0.13 0.18 0.23 0.33 1270 1
##
## Samples were drawn using NUTS(diag_e) at Wed Oct 4 19:29:33 2017.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
## Inference for Stan model: double_update_notrialpost.
## 1 chains, each with iter=998; warmup=0; thin=1;
## post-warmup draws per chain=998, total post-warmup draws=998.
##
## mean sd 2.5% 25% 50% 75% 97.5%
## mu_p[1] -0.67 0.04 -0.75 -0.70 -0.67 -0.65 -0.59
## mu_p[2] -1.05 0.02 -1.09 -1.07 -1.05 -1.04 -1.02
## sigma[1] 1.02 0.06 0.91 0.98 1.01 1.05 1.13
## sigma[2] 0.14 0.02 0.11 0.12 0.14 0.15 0.17
##
## Approximate samples were drawn using VB(meanfield) at Fri Oct 6 03:02:53 2017.
## Inference for Stan model: double_update_notrialpost.
## 1 chains, each with iter=998; warmup=0; thin=1;
## post-warmup draws per chain=998, total post-warmup draws=998.
##
## mean sd 2.5% 25% 50% 75% 97.5%
## mu_p[1] -1.04 0.05 -1.15 -1.08 -1.04 -1.01 -0.93
## mu_p[2] -1.13 0.02 -1.18 -1.15 -1.14 -1.12 -1.09
## sigma[1] 0.75 0.06 0.64 0.71 0.75 0.79 0.88
## sigma[2] 0.05 0.02 0.02 0.04 0.05 0.06 0.12
##
## Approximate samples were drawn using VB(meanfield) at Fri Oct 6 03:03:03 2017.
## Inference for Stan model: double_update_notrialpost.
## 6 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=6000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## mu_p[1] -0.58 0 0.17 -0.91 -0.69 -0.58 -0.47 -0.24 1575 1.00
## mu_p[2] -1.09 0 0.05 -1.20 -1.13 -1.09 -1.06 -1.00 2447 1.00
## sigma[1] 1.17 0 0.17 0.89 1.05 1.15 1.27 1.54 3155 1.00
## sigma[2] 0.25 0 0.05 0.16 0.21 0.24 0.27 0.35 1814 1.01
##
## Samples were drawn using NUTS(diag_e) at Fri Oct 6 03:32:52 2017.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
## Inference for Stan model: double_update_notrialpost.
## 6 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=6000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5%
## mu_p[1] -0.64 0.32 0.62 -1.21 -0.98 -0.85 -0.68 1.08
## mu_p[2] -1.18 0.08 0.16 -1.52 -1.19 -1.12 -1.08 -1.00
## sigma[1] 50495.54 65953.98 123107.69 0.61 0.77 0.87 1.05 454751.79
## sigma[2] 0.19 0.01 0.07 0.05 0.15 0.19 0.24 0.32
## n_eff Rhat
## mu_p[1] 4 3.35
## mu_p[2] 3 2.76
## sigma[1] 3 2.65
## sigma[2] 34 1.07
##
## Samples were drawn using NUTS(diag_e) at Fri Oct 6 06:21:12 2017.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
## Inference for Stan model: double_update_notrialpost.
## 1 chains, each with iter=998; warmup=0; thin=1;
## post-warmup draws per chain=998, total post-warmup draws=998.
##
## mean sd 2.5% 25% 50% 75% 97.5%
## mu_p[1] -0.70 0.04 -0.79 -0.73 -0.70 -0.67 -0.62
## mu_p[2] -1.05 0.02 -1.08 -1.06 -1.05 -1.04 -1.02
## sigma[1] 1.03 0.07 0.90 0.98 1.03 1.07 1.16
## sigma[2] 0.14 0.02 0.10 0.13 0.14 0.15 0.19
##
## Approximate samples were drawn using VB(meanfield) at Sat Oct 7 04:01:29 2017.
## Inference for Stan model: double_update_notrialpost.
## 1 chains, each with iter=998; warmup=0; thin=1;
## post-warmup draws per chain=998, total post-warmup draws=998.
##
## mean sd 2.5% 25% 50% 75% 97.5%
## mu_p[1] -2.89 1.18 -5.35 -3.63 -2.85 -2.09 -0.61
## mu_p[2] -1.85 0.35 -2.56 -2.09 -1.85 -1.62 -1.14
## sigma[1] 0.69 0.80 0.09 0.26 0.45 0.80 3.18
## sigma[2] 0.31 0.23 0.07 0.16 0.25 0.41 0.96
##
## Approximate samples were drawn using VB(meanfield) at Sat Oct 7 04:01:38 2017.
## Inference for Stan model: double_update_notrialpost.
## 6 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=6000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## mu_p[1] -0.56 0 0.17 -0.90 -0.68 -0.56 -0.45 -0.23 1411 1
## mu_p[2] -1.09 0 0.05 -1.20 -1.13 -1.09 -1.06 -1.00 1888 1
## sigma[1] 1.16 0 0.16 0.89 1.05 1.15 1.26 1.53 3212 1
## sigma[2] 0.25 0 0.05 0.16 0.21 0.24 0.27 0.35 1702 1
##
## Samples were drawn using NUTS(diag_e) at Sat Oct 7 04:22:55 2017.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
## Inference for Stan model: double_update_notrialpost.
## 6 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=6000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## mu_p[1] -0.90 0 0.17 -1.24 -1.01 -0.89 -0.79 -0.56 1345 1
## mu_p[2] -1.11 0 0.07 -1.25 -1.15 -1.11 -1.07 -0.99 2115 1
## sigma[1] 0.85 0 0.15 0.60 0.74 0.83 0.94 1.19 2166 1
## sigma[2] 0.18 0 0.07 0.04 0.13 0.17 0.22 0.33 1118 1
##
## Samples were drawn using NUTS(diag_e) at Sat Oct 7 04:32:13 2017.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
The Rhat values, which are Brooks-Gelman-Rubin statistics or “potential scale reduction factors” (Brooks & Gelman, 1998; Kruschke, 20xx), should be close to 1, and definitely within 10% of 1 (0.9 to 1.1). It appears that one MCMC run failed to converge while the others converged correctly.
To assess the accuracy of the chains, we need to take into account the effective sample size (ESS): how much independent information actually exists in our samples. To calculate this, we need a measure of autocorrelation. From the ESS, we can derive the Monte Carlo standard error.
We can view and measure autocorrelation in a number of ways:

- To get the effective sample size, we can use rstan’s stan_ess; bayesplot also offers appropriate diagnostic tools. Kruschke (20xx) recommends an effective sample size of 10000 for estimating an HDI.
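For intuition about what stan_ess reports, the standard single-chain estimate is \(ESS = N / (1 + 2\sum_k \rho_k)\), where \(\rho_k\) is the lag-\(k\) autocorrelation. A rough base-R sketch (rstan and bayesplot use more refined multi-chain estimators; the function name is hypothetical):

```r
# Effective sample size from autocorrelations, truncating the sum at the
# first non-positive autocorrelation.
ess_of_chain <- function(draws, max_lag = 100) {
  rho <- acf(draws, lag.max = max_lag, plot = FALSE)$acf[-1]  # drop lag 0
  first_nonpos <- which(rho <= 0)[1]
  if (!is.na(first_nonpos)) rho <- rho[seq_len(first_nonpos - 1)]
  length(draws) / (1 + 2 * sum(rho))
}

set.seed(3)
iid <- rnorm(5000)                                  # independent draws
ar1 <- as.numeric(arima.sim(list(ar = 0.9), 5000))  # strongly autocorrelated
ess_of_chain(iid)  # near 5000
ess_of_chain(ar1)  # a small fraction of 5000
```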
get.model.title <- function(ms, i) {
return(paste("group=", ms$g, ms$m, ms$t, model.summary.all[, first(EstimationMethod),
by = TestId][i, V1], "vars=", length(names(model.stanfits[[i]]))))
}
get.cols.to.process <- function(i) {
names(model.stanfits[[i]])[!sapply(names(model.stanfits[[i]]), function(x) {
return(grepl("alpha\\[", x) || grepl("beta\\[", x) || grepl("alpha_pr",
x) || grepl("beta_pr", x) || grepl("log_lik\\[", x))
})]
}
for (i in 1:length(model.summaries)) {
if (model.summaries[[i]]$m == "double_update_notrialpost" & model.summary.all[,
first(EstimationMethod), by = TestId][i, V1] == "MCMC") {
print(model.summaries[[i]]$m == "double_update_notrialpost" & model.summary.all[,
first(EstimationMethod), by = TestId][i, V1] == "MCMC")
# only look at these because looking at models with trial posteriors is
# unnecessary and timeconsuming.
model.title <- get.model.title(model.summaries[[i]], i)
cols.to.process <- get.cols.to.process(i)
print(stan_ess(model.stanfits[[i]], pars = cols.to.process, separate_chains = TRUE))
}
}
# https://rdrr.io/cran/rstan/man/plotting-functions.html
# https://cran.r-project.org/web/packages/bayesplot/vignettes/visual-mcmc-diagnostics.html#effective-sample-size
Kruschke advocates an ESS of 10000 for values like 95% HDIs of posteriors. We have 1000 post-warmup iterations over 6 chains, i.e. 6000 draws, and the ESS/SS ratio appears to be in the realm of 0.2 to 0.4. To get an ESS of 10000 without optimizing the model further, we’d therefore need an actual sample size of 25000 to 50000. Twelve chains of 4000 post-warmup iterations would give us 48000, which seems like a good amount to aim for.
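The arithmetic behind that target, as a quick check (the ratio endpoints are the observed range, not measurements from any particular run):

```r
target_ess <- 10000
ess_ratio <- c(worst = 0.2, best = 0.4)  # observed ESS/SS range
target_ess / ess_ratio                   # required raw draws: 50000 and 25000

chains <- 12
post_warmup <- 4000
chains * post_warmup                     # 48000 draws, near the upper bound
```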
Monte Carlo Standard Error is calculated as
\[MCSE = \frac{SD}{\sqrt{ESS}}\]
that is, the standard formula for the standard error, but with the effective sample size used in place of the actual sample size.
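As a worked instance of the formula (mcse here is a hypothetical helper; rstan’s se_mean is computed equivalently from its own ESS estimate):

```r
# Monte Carlo standard error: the sample SD scaled by the square root of
# the effective sample size rather than the raw sample size.
mcse <- function(draws, ess) sd(draws) / sqrt(ess)

set.seed(4)
draws <- rnorm(6000)
mcse(draws, ess = 6000)  # for independent draws this is just sd/sqrt(N)
mcse(draws, ess = 1500)  # quartering the ESS doubles the standard error
```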
We can view the Monte Carlo standard error using the rstan function stan_mcse.
for (i in 1:length(model.summaries)) {
if (model.summaries[[i]]$m == "double_update_notrialpost" & model.summary.all[,
first(EstimationMethod), by = TestId][i, V1] == "MCMC") {
print(model.summaries[[i]]$m == "double_update_notrialpost" & model.summary.all[,
first(EstimationMethod), by = TestId][i, V1] == "MCMC")
# only look at these because looking at models with trial posteriors is
# unnecessary and timeconsuming.
model.title <- paste0("Model MSCE:\n", get.model.title(model.summaries[[i]],
i))
cols.to.process <- get.cols.to.process(i)
print(stan_mcse(model.stanfits[[i]], pars = cols.to.process, separate_chains = TRUE) +
labs(title = model.title, x = "log value", y = "statistic"))
}
}
Efficiency describes how quickly the estimation algorithm converges on our model’s result. This includes not only measures like autocorrelation but also other aspects of model performance that could slow it down.
Efficiency is one of Turner’s key arguments for the use of DE-MCMC over other forms of MCMC, such as the Metropolis-Hastings algorithm (CITEREF:TURNER2013) or the NUTS / Hamiltonian Monte Carlo sampler (personal communication) as implemented in Stan.
MCMC proved to yield stable results that, after a 1000-iteration warmup and with one notable exception, were representative, accurate, and somewhat efficient. Compared to variational Bayes, it yielded posteriors that were reliable but much less precise. Variational Bayes yielded posteriors that seemed informative but, across repetitions, could be shown to be completely unreliable.
Because variational Bayes is dramatically quicker, on the order of seconds rather than hours, it may still be worthwhile to try to obtain reliable results from it.
The efficiency wasn’t too bad: an \(ESS/SS\) ratio of between 0.2 and 0.4 seems acceptable, but the resulting ESS falls well short of Kruschke’s recommended effective sample size of around 10000 for estimating an HDI. Thus, the sample size needs to be increased.
Increasing our sample size may also give renegade chains, like the one observed in a single iteration here, a chance to finally resolve.
Thus, from here, I need to: